Derivative pricing using neural networks#

Over the past few years, advancements in artificial intelligence and machine learning have naturally led to the adoption of these techniques in many other fields. Neural networks, for example, have become a highly flexible tool capable of solving a wide variety of problems such as image classification, speech recognition, and language translation.

Today, we also see their application in various industries, including finance. In this notebook, we demonstrate how these models can be used—for instance, to price derivatives. Although derivative pricing is a relatively simple and well-studied problem, it provides an interesting example to illustrate how neural networks can solve financial problems, not only in pricing but also in inverse problems (e.g., model calibration) and risk management.

In this entry, we focus on a particular technique: Physics-Informed Neural Networks (PINNs). As the name suggests, these neural networks are designed to incorporate additional information derived from physical laws or known mathematical relationships. Typically, this is achieved by embedding these laws into the loss function to guide the network toward a solution that satisfies the given differential equation.

What are Physics-Informed Neural Networks (PINNs)?#

First, we begin with some definitions. Suppose we wish to solve a partial differential equation (PDE) of the form \( \mathcal{N}[V(\mathbf{x})] = 0,\quad \mathbf{x} \in \Omega, \) where:

  • \(\mathcal{N}\) is a differential operator,

  • \(V(\mathbf{x})\) is the unknown solution,

  • \(\Omega\) is the spatial (and possibly temporal) domain.

In a PINN, we approximate \(V(\mathbf{x})\) with a neural network \(V_\theta(\mathbf{x})\) parameterized by \(\theta\). The loss function is constructed to penalize deviations from both the governing PDE in the domain and the prescribed boundary conditions on \(\partial \Omega\). Formally, the loss can be written as:

\[ \mathcal{L}(\theta) = \mathcal{L}_{\mathrm{PDE}}(\theta) + \mathcal{L}_{\mathrm{BC}}(\theta), \]

with

\[ \mathcal{L}_{\mathrm{PDE}}(\theta) = \frac{1}{N_r} \sum_{i=1}^{N_r} \left| \mathcal{N}[V_\theta(\mathbf{x}_r^i)] \right|^2, \]

and

\[ \mathcal{L}_{\mathrm{BC}}(\theta) = \frac{1}{N_b} \sum_{j=1}^{N_b} \left| V_\theta(\mathbf{x}_b^j) - g(\mathbf{x}_b^j) \right|^2, \]

where:

  • \(\{\mathbf{x}_r^i\}_{i=1}^{N_r}\) are the collocation points in the interior of the domain \(\Omega\),

  • \(\{\mathbf{x}_b^j\}_{j=1}^{N_b}\) are the points on the boundary \(\partial \Omega\) with known values \(g(\mathbf{x}_b^j)\).

In order to train the network, we minimize the loss function using gradient-based optimization techniques. The resulting neural network will approximate the solution to the PDE in the domain \(\Omega\) and satisfy the boundary conditions on \(\partial \Omega\).
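To make the recipe concrete before turning to finance, here is a minimal, illustrative sketch (a toy problem of our own choosing, not part of the pricing example below) that applies exactly this loss construction to the ODE \(u'(x) + u(x) = 0\) with \(u(0) = 1\), whose exact solution is \(e^{-x}\):

```python
import torch
import torch.nn as nn
from torch.autograd import grad

torch.manual_seed(0)

# Tiny network approximating u(x)
net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
x = torch.rand(100, 1, requires_grad=True)   # interior collocation points in (0, 1)
x0 = torch.zeros(1, 1)                       # boundary point x = 0

opt = torch.optim.Adam(net.parameters(), lr=1e-2)
losses = []
for _ in range(500):
    opt.zero_grad()
    u = net(x)
    # u'(x) via automatic differentiation
    dudx = grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    loss_pde = torch.mean((dudx + u) ** 2)        # residual of u' + u = 0
    loss_bc = (net(x0) - 1.0).pow(2).mean()       # boundary condition u(0) = 1
    loss = loss_pde + loss_bc
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

After training, `net` approximates \(e^{-x}\) on \([0, 1]\) without ever seeing labeled solution values in the interior, which is the essence of the PINN idea.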

To keep things simple, let's jump straight into the example. First, we import the necessary libraries:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

from torch.autograd import grad
from scipy.stats import norm

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
np.random.seed(42)
torch.manual_seed(42)

Loss functions and the governing equation#

For the problem of pricing options, the fundamental PDE we need to solve is the Black-Scholes equation. This equation is a parabolic PDE that describes the evolution of the price \(V(S,t)\) of an option in terms of time \(t\) and the underlying asset price \(S\). It is given by:

\[ \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0, \]

where:

  • \(V(S,t)\) is the option price,

  • \(S\) is the price of the underlying asset,

  • \(r\) is the risk-free interest rate,

  • \(\sigma\) is the volatility of the underlying asset.
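As a sanity check on the equation itself, we can verify numerically that the closed-form Black-Scholes call price (the same formula used in the benchmark section below) satisfies this PDE. Writing the price in terms of the time to maturity \(\tau = T - t\), note that \(\partial V/\partial t = -\partial V/\partial \tau\); a finite-difference evaluation of the residual at an arbitrary point should then be close to zero:

```python
import numpy as np
from scipy.stats import norm

def call_price(s, k, r, sigma, tau):
    # Closed-form Black-Scholes call price, tau = T - t
    d1 = (np.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return s * norm.cdf(d1) - k * np.exp(-r * tau) * norm.cdf(d2)

s, k, r, sigma, tau = 20.0, 15.0, 0.04, 0.25, 0.5
hs, ht = 1e-3 * s, 1e-5

# Central finite differences; dV/dt = -dV/dtau
dVdt = -(call_price(s, k, r, sigma, tau + ht)
         - call_price(s, k, r, sigma, tau - ht)) / (2 * ht)
dVdS = (call_price(s + hs, k, r, sigma, tau)
        - call_price(s - hs, k, r, sigma, tau)) / (2 * hs)
d2VdS2 = (call_price(s + hs, k, r, sigma, tau)
          - 2 * call_price(s, k, r, sigma, tau)
          + call_price(s - hs, k, r, sigma, tau)) / hs ** 2

residual = (dVdt + 0.5 * sigma ** 2 * s ** 2 * d2VdS2
            + r * s * dVdS - r * call_price(s, k, r, sigma, tau))
print(f"PDE residual: {residual:.2e}")
```

The residual is zero up to finite-difference truncation error, confirming the sign conventions before we encode the same expression in the loss.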

Our objective is to use this PDE to guide the training of our neural network. Modern machine learning libraries, such as PyTorch, offer automatic differentiation, so torch.autograd.grad can efficiently compute the necessary derivatives, such as \(\nabla V(S,t)\), at each training point. In our formulation, \(\mathcal{L}_{\mathrm{PDE}}(\theta)\) is represented by the following function:

def pde_loss_f(model, inputs, sigma, r):
    inputs.requires_grad_(True)
    V = model(inputs)

    # First order
    gradients = grad(V, inputs, grad_outputs=torch.ones_like(
        V), create_graph=True)[0]
    dVdt = gradients[:, 0]
    dVdS = gradients[:, 1]

    # Second order
    d2VdS2 = grad(dVdS, inputs, grad_outputs=torch.ones_like(
        dVdS), create_graph=True)[0][:, 1]
    S = inputs[:, 1]
    V = V[:, 0]

    # PDE
    pde_residual = dVdt + 0.5 * sigma ** 2 * S ** 2 * d2VdS2 + r * S * dVdS - r * V
    loss_pde = torch.mean(pde_residual ** 2)
    return loss_pde
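A quick way to convince ourselves that the derivative bookkeeping in `pde_loss_f` is right is to run the same `grad` calls on a function with known derivatives, e.g. \(V(t, S) = t S^2\):

```python
import torch
from torch.autograd import grad

inputs = torch.tensor([[0.5, 3.0]], requires_grad=True)  # one (t, S) point
t, S = inputs[:, 0], inputs[:, 1]
V = (t * S ** 2).unsqueeze(1)                            # V(t, S) = t * S^2

# Same pattern as pde_loss_f: first-order gradients w.r.t. both inputs...
gradients = grad(V, inputs, grad_outputs=torch.ones_like(V),
                 create_graph=True)[0]
dVdt, dVdS = gradients[:, 0], gradients[:, 1]
# ...then the second derivative w.r.t. S (column 1)
d2VdS2 = grad(dVdS, inputs, grad_outputs=torch.ones_like(dVdS),
              create_graph=True)[0][:, 1]

# Analytic values: dV/dt = S^2, dV/dS = 2 t S, d2V/dS2 = 2 t
print(dVdt.item(), dVdS.item(), d2VdS2.item())  # 9.0 3.0 1.0
```

The columns line up with the convention used throughout this notebook: index 0 is time \(t\) and index 1 is the asset price \(S\).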

Additionally, to incorporate the boundary conditions, we define an additional loss function that forces the network to satisfy these constraints. Since the boundary conditions we use are of Dirichlet type, the loss function is relatively simple (it merely compares the network outputs against the ground truth). However, nothing prevents us from employing other types of boundary conditions (such as Neumann or Robin).

def boundary_loss_f(model, inputs, outputs):
    V_pred = model(inputs)
    boundary_loss = torch.mean((V_pred - outputs) ** 2)
    return boundary_loss

Collocation points#

To solve the PDE using PINNs, we need to define the numerical domain over which the equation will be solved. Much like traditional numerical methods (e.g., finite differences), we select a set of points, called collocation points, where the PDE is enforced. These include points in the interior of the domain \(\Omega\) as well as on its boundary \(\partial\Omega\).

To enforce the boundary conditions, additional loss terms are added that force the network to satisfy them. For a European call option, the typical boundary conditions are:

  • At \(S = 0\): \( V(0, t) = 0 \), since if the underlying asset’s price is zero, the option is worthless.

  • For \(S \to \infty\): \( V(S, t) \approx S - K e^{-r(T-t)} \), as the option behaves like a linear payoff for large \(S\).

  • At expiration \(t = T\): \( V(S, T) = \max(S - K,\, 0) \), which is simply the payoff of the option.
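The large-\(S\) condition can be checked numerically against the closed-form call price (restated here so the snippet stands alone; the parameters match the configuration used later, \(K = 15\), \(r = 0.04\), \(\sigma = 0.25\), \(T = 1\)):

```python
import numpy as np
from scipy.stats import norm

def call_price(s, k, r, sigma, tau):
    # Closed-form Black-Scholes call price, tau = T - t
    d1 = (np.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return s * norm.cdf(d1) - k * np.exp(-r * tau) * norm.cdf(d2)

k, r, sigma, tau = 15.0, 0.04, 0.25, 1.0
# Gap between the exact price and the linear asymptote S - K e^{-r tau}
gaps = [call_price(s, k, r, sigma, tau) - (s - k * np.exp(-r * tau))
        for s in (30.0, 60.0, 120.0)]
print(gaps)
```

The gap shrinks rapidly as \(S\) grows, so imposing the asymptote at \(S_{\max} = 4K\) introduces only a small truncation error.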

We sample these points using random numbers and use them to train the neural network. Below is an implementation for sampling them:

def payoff(s, k):
    return np.maximum(s-k, 0)


def interior_samples(option_config, scaling=4):
    t = np.random.uniform(
        0, option_config['maturity'], option_config['n_samples'])
    s = np.random.uniform(
        0, scaling*option_config['strike'], option_config['n_samples'])
    tag = np.zeros(option_config['n_samples'])
    output = np.zeros(option_config['n_samples'])
    return t, s, tag, output


def top_boundary_samples(option_config, scaling=4):
    t = np.random.uniform(
        0, option_config['maturity'], option_config['n_samples'])
    s = np.ones(option_config['n_samples']) * scaling*option_config['strike']
    strike = option_config['strike']
    s_max = scaling*option_config['strike']
    r = option_config['r']
    T = option_config['maturity']
    output = s_max - strike * np.exp(-r*(T-t))
    tag = np.ones(option_config['n_samples'])
    return t, s, tag, output


def bottom_boundary_samples(option_config):
    t = np.random.uniform(
        0, option_config['maturity'], option_config['n_samples'])
    s = np.zeros(option_config['n_samples'])
    output = np.zeros(option_config['n_samples'])
    tag = np.ones(option_config['n_samples'])
    return t, s, tag, output


def initial_condition_samples(option_config, scaling=4):
    t = np.ones(option_config['n_samples']) * option_config['maturity']
    s = np.random.uniform(
        0, scaling*option_config['strike'], option_config['n_samples'])
    tag = np.ones(option_config['n_samples'])
    output = payoff(s, option_config['strike'])
    return t, s, tag, output

Important

The choice of collocation points is critical for the convergence of the method. One potential improvement is to use quasi-random numbers to distribute the collocation points more uniformly across the domain, thereby avoiding clustering that can occur with some random number generators.
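As a sketch of that improvement, `scipy.stats.qmc` provides low-discrepancy Sobol sequences that could replace the uniform draws in `interior_samples` (the bounds below assume the configuration used later: maturity 1 and \(S_{\max} = 4K = 60\)):

```python
import numpy as np
from scipy.stats import qmc

# Scrambled Sobol sequence in [0, 1)^2
sampler = qmc.Sobol(d=2, scramble=True, seed=42)
points = sampler.random_base2(m=10)                  # 2^10 = 1024 points
# Scale to the (t, S) domain: t in [0, 1], S in [0, 60]
points = qmc.scale(points, [0.0, 0.0], [1.0, 60.0])
t, s = points[:, 0], points[:, 1]
print(points.shape)
```

Sobol points fill the rectangle far more evenly than pseudo-random draws, which tends to reduce the number of collocation points needed for a given accuracy.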

Of course, we need to define the financial parameters that will be used in the problem. For this example, we will use the following values:

config = {'strike': 15, 'maturity': 1,
          'r': 0.04, 'sigma': 0.25, 'n_samples': 10000}

# Generate samples
inner_samples = interior_samples(config)
top_samples = top_boundary_samples(config)
bottom_samples = bottom_boundary_samples(config)
initial_samples = initial_condition_samples(config)

Visually, this is what our domain looks like:

The neural network#

In this exercise, we employ a relatively simple feedforward neural network with two hidden layers, a few neurons each, and \(\tanh\) activation functions. While more complex architectures may be required for higher-dimensional problems, empirical evidence suggests that a simple network is sufficient for this particular case.

class PINN(nn.Module):
    def __init__(self):
        super(PINN, self).__init__()
        size = 15
        self.hidden_layers = nn.ModuleList([
            nn.Linear(2, size),
            nn.Linear(size, size),
        ])
        self.output_layer = nn.Linear(size, 1)

    def forward(self, inputs):
        x = inputs
        for layer in self.hidden_layers:
            x = torch.tanh(layer(x))
        x = self.output_layer(x)
        return x
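As a quick shape check, the same stack can be written with `nn.Sequential`: a batch of \((t, S)\) pairs of shape `(N, 2)` should map to prices of shape `(N, 1)`:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
# Equivalent layer stack to the PINN class above
net = nn.Sequential(
    nn.Linear(2, 15), nn.Tanh(),
    nn.Linear(15, 15), nn.Tanh(),
    nn.Linear(15, 1),
)
batch = torch.rand(8, 2)   # eight random (t, S) pairs
out = net(batch)
print(out.shape)           # torch.Size([8, 1])
```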

It is also important to initialize the network weights properly, as this can affect the convergence of the solution. Unlike typical machine learning tasks that require large amounts of data, PINNs can often solve PDEs with relatively little (even synthetically generated) data. Therefore, global optimizers such as L-BFGS, which use the entire dataset when updating the weights, are particularly effective.

def initialize_weights(model):
    for layer in model.hidden_layers:
        if isinstance(layer, nn.Linear):
            nn.init.xavier_uniform_(layer.weight)
            nn.init.zeros_(layer.bias)

Training loop#

With everything in place, we can proceed to train the neural network.

def train_model(model, inputs, outputs, tags, sigma, r, iters=1000):
    optimizer = optim.LBFGS(
        model.parameters(),
        max_iter=iters,
        lr=1,
        line_search_fn='strong_wolfe',
        tolerance_grad=1e-10,
        tolerance_change=1e-10
    )

    loss_history = {'total_loss': []}

    def closure():
        optimizer.zero_grad()
        total_loss = 0
        # Defaults so the progress print below never hits an unbound name
        pde_loss = torch.tensor(0.0)
        boundary_loss = torch.tensor(0.0)

        pde_mask = (tags == 0).squeeze()
        boundary_mask = (tags == 1).squeeze()

        if pde_mask.any():
            pde_loss = pde_loss_f(model, inputs[pde_mask], sigma, r)
            total_loss += pde_loss

        if boundary_mask.any():
            boundary_loss = boundary_loss_f(
                model, inputs[boundary_mask], outputs[boundary_mask])
            total_loss += boundary_loss

        print(
            f'PDE Loss: {pde_loss.item()} | Boundary Loss: {boundary_loss.item()}', end='\r', flush=True)
        total_loss.backward()
        loss_history['total_loss'].append(total_loss.item())
        return total_loss
    optimizer.step(closure)
    return loss_history
Next, we assemble the input tensors, outputs, and tags from the sampled points:

# input samples
inner_s = torch.stack([torch.tensor(x, dtype=torch.float32)
                      for x in inner_samples[0:2]], dim=0).T
top_s = torch.stack([torch.tensor(x, dtype=torch.float32)
                    for x in top_samples[0:2]], dim=0).T
bottom_s = torch.stack([torch.tensor(x, dtype=torch.float32)
                       for x in bottom_samples[0:2]], dim=0).T
initial_s = torch.stack([torch.tensor(x, dtype=torch.float32)
                        for x in initial_samples[0:2]], dim=0).T

# output
inner_output = torch.tensor(inner_samples[3], dtype=torch.float32)
top_output = torch.tensor(top_samples[3], dtype=torch.float32)
bottom_output = torch.tensor(bottom_samples[3], dtype=torch.float32)
initial_output = torch.tensor(initial_samples[3], dtype=torch.float32)

# tags
inner_tags = torch.tensor(inner_samples[2], dtype=torch.float32)
top_tags = torch.tensor(top_samples[2], dtype=torch.float32)
bottom_tags = torch.tensor(bottom_samples[2], dtype=torch.float32)
initial_tags = torch.tensor(initial_samples[2], dtype=torch.float32)

# concatenate
inputs = torch.cat([inner_s, top_s, bottom_s, initial_s], dim=0)
outputs = torch.cat([inner_output, top_output, bottom_output,
                    initial_output], dim=0).reshape(-1, 1)
tags = torch.cat([inner_tags, top_tags, bottom_tags,
                 initial_tags], dim=0).reshape(-1, 1)

# shuffle and move everything to the same device as the model
idx = torch.randperm(inputs.size(0))
inputs = inputs[idx].to(device)
outputs = outputs[idx].to(device)
tags = tags[idx].to(device)
model = PINN().to(device)
initialize_weights(model)
loss_history = train_model(model, inputs, outputs,
                           tags, config['sigma'], config['r'], iters=10000)

Results#

Let’s see how the neural network performs in pricing the option. We will compare the network’s predictions against the true option price obtained from the Black-Scholes formula, but first let’s look at the loss evolution during training.

Benchmark#

How does it compare with the analytical solution? Below, we present a comparison between the analytical solution and the solution obtained via the neural network. As we know, the analytical solution for a European call option is given by

\[ C(S, t) = S\, \Phi(d_1) - K e^{-r(T-t)}\, \Phi(d_2), \]

where

\[ d_1 = \frac{\ln(S/K) + \left(r + \frac{\sigma^2}{2}\right)(T-t)}{\sigma \sqrt{T-t}}, \]
\[ d_2 = d_1 - \sigma \sqrt{T-t}, \]

and \(\Phi\) denotes the cumulative distribution function (CDF) of the standard normal distribution. Below, we show the comparison between the analytical solution and the solution obtained with the neural network.

def bs_price(s, k, r, sigma, t):
    # Closed-form Black-Scholes call price; here `t` is the time to maturity
    with np.errstate(divide='ignore', invalid='ignore'):  # silence log(0) at S = 0
        d1 = (np.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * np.sqrt(t))
        d2 = d1 - sigma * np.sqrt(t)
        return s * norm.cdf(d1) - k * np.exp(-r * t) * norm.cdf(d2)


# `s`, `t`, and `surface` hold the grid and the PINN predictions
# produced earlier when evaluating the trained network on an (S, t) grid
exact_surface = np.zeros_like(surface)
for i in range(len(s)):
    exact_surface[:, i] = bs_price(
        s[i], config['strike'], config['r'], config['sigma'], t)

exact_surface = np.flip(exact_surface.T, axis=1)
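Before trusting the benchmark, a cheap sanity check on the analytical price (restated here so the snippet is self-contained) is to verify the standard no-arbitrage bounds \(\max(S - K e^{-r\tau},\, 0) \le C \le S\) across the pricing domain:

```python
import numpy as np
from scipy.stats import norm

def bs_price(s, k, r, sigma, t):
    # Closed-form Black-Scholes call price; `t` is the time to maturity
    d1 = (np.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return s * norm.cdf(d1) - k * np.exp(-r * t) * norm.cdf(d2)

s_grid = np.linspace(1.0, 60.0, 50)
c = bs_price(s_grid, 15.0, 0.04, 0.25, 1.0)

# No-arbitrage bounds: max(S - K e^{-r tau}, 0) <= C <= S
lower = np.maximum(s_grid - 15.0 * np.exp(-0.04), 0.0)
in_bounds = bool(np.all(c >= lower - 1e-12) and np.all(c <= s_grid + 1e-12))
print(in_bounds)  # True
```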
Finally, we compute the relative error between the analytical surface and the PINN surface:
error = np.linalg.norm(exact_surface - surface, 2) / \
    np.linalg.norm(exact_surface, 2)
print("Error (2-norm): %.2f%%" % (error * 100))
Error (2-norm): 0.15%

We obtain an error of 0.15% against the exact solution without much tweaking! This demonstrates that even for a relatively simple problem such as pricing a European call option, the PINN approach can achieve decent accuracy. This result showcases the potential of the technique for more complex financial models, where analytical solutions are not available.